Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache #98591

Merged
merged 3 commits into from
Nov 30, 2022

Conversation

wjakob
Copy link
Contributor

@wjakob wjakob commented Oct 24, 2022

The typing module internally uses LRU caches to memoize the result of type-related computations. These LRU caches have the potential to introduce difficult-to-find reference leaks spanning multiple extension modules.

Suppose that a binary extension module a leaks a reference to a type a.A with typed function signatures.

The leaked type a.A will induce a secondary refleak of the typing LRU caches, which will at this point cause tertiary leaks in entirely unrelated extension modules. In this way, a benign refleak in any extension can cause difficult-to-debug leaks everywhere else.

This commit fixes this by removing the direct reference from a.A to typing implementation details.

@wjakob
Copy link
Contributor Author

wjakob commented Oct 24, 2022

PS: It would be good to merge this not only in Python 3.12, but also older versions (3.11, 3.10, 3.9, 3.8).

@AlexWaygood
Copy link
Member

PS: It would be good to merge this not only in Python 3.12, but also older versions (3.11, 3.10, 3.9, 3.8).

3.8 and 3.9 are now only accepting security-related patches, so this definitely won't be backported that far, I'm afraid.

@wjakob
Copy link
Contributor Author

wjakob commented Oct 24, 2022

3.8 and 3.9 are now only accepting security-related patches, so this definitely won't be backported that far, I'm afraid.

Fair enough. In that case, I propose to target 3.10, 3.11, and 3.12.

@JelleZijlstra
Copy link
Member

As mentioned in the issue I'm skeptical that this is something we should fix in typing.py. I'll ask for input from other core devs (after the dust from the 3.11 release settles).

@gvanrossum
Copy link
Member

I'm skeptical too. I feel that the tool is incorrectly blaming typing.py.

@wjakob
Copy link
Contributor Author

wjakob commented Oct 24, 2022

@gvanrossum, did you see the discussion in #98253? When I first reported the issue, the actual source of these refleaks was mysterious. Further along the discussion thread, the problem was finally tracked down (see post #98253 (comment))

(Summary: refleaks in extension module tend to hold typed function signatures alive for eternity. This function signature references typing.py internals (LRU caches), which in turn cast a web of references to other parts of the interpreter)

In this way, the caching mechanism in typing.py can turn virtually any refleak in any typed extension module into refleaks elsewhere. Ultimately, it becomes difficult to develop leak-free extensions, because any systematic checking for leaks runs into flukes. I'm creating a framework for extension modules that tries to be well-behaved in this ecosystem, but its internal tests for leaks are hampered by this behavior. Quick tests show that many of the bigger packages (pandas, pytorch, tensorflow) include some refleaks that are problematic in this context.

The patch in this PR is simple and it breaks this problematic chain of references. Of course, one could say: let's leave typing.py as-is and fix all of the other extension modules. I just don't think it is very practical to do so.

@wjakob
Copy link
Contributor Author

wjakob commented Oct 24, 2022

I'm skeptical too. I feel that the tool is incorrectly blaming typing.py.

If by "the tool", you mean the nanobind-based reproducer, please let me know. It would be straightforward to reproduce the leak with a minimal CPython extension.

I can sit down and make such a reproducer, but I would prefer to only do so if it's time well spent (in the sense that there is willingness to fix the issue if convincingly demonstrated).

@Fidget-Spinner Fidget-Spinner changed the title gh-98253: fix refleak issue in typing.py gh-98253: Break potential reference cycles in external code caused by typing.py lru_cache Oct 25, 2022
@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Oct 25, 2022

Speaking as a typing maintainer,

Theoretically, I would like to say that the onus is on the external code to fix their refleaks. Since in this case as you have pointed out, typing.py doesn't have any refleaks. It just worsens external ones.

However, I'm not opposed to the change (+0.5), because there aren't that many more lines of code. So the maintenance burden to support something like this isn't large.

Also as you have rightly pointed out, it's nearly impossible to get all extension modules to fix all their refleaks. So I'm satisfied with typing not making them worse.

@gvanrossum
Copy link
Member

@wjakob Never mind. I trust @Fidget-Spinner here.

@Fidget-Spinner Fidget-Spinner changed the title gh-98253: Break potential reference cycles in external code caused by typing.py lru_cache gh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache Oct 26, 2022
@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Oct 26, 2022

static global state in extension modules is bad and should be avoided as much. See https://peps.python.org/pep-0489/
They are bad for multiple reasons not to mention they are a problem for sub-interpreter. I worry if we add workarounds for this in stdlib then library authors will never fix their own code and we will get more requests to add workarounds so that they don't have to fix their own code. Note that just clearing reference isn't enough here, when the GC is involved this can lead to very strange bugs and are a pain to debug. See a similar asyncio bug #96232 involving static c globals. For this specific case, this seems fine but we need to be cautious because if we add this then the extension authors will never notice that there is a bug in their code as it will probably "just work".

P.S I am not a typing maintainer so I'll defer to @Fidget-Spinner.

@wjakob
Copy link
Contributor Author

wjakob commented Oct 26, 2022

There are a few misconceptions in the above message, I'll try to clarify.

static global state in extension modules is bad and should be avoided as much. [..] I worry if we add workarounds for this in stdlib then library authors will never fix their own code and we will get more requests to add workarounds so that they don't have to fix their own code.

While this PR does introduce a static state variable, it represents state that is already static (the caches). FWIW there is a _cleanup variable just a few lines above that serves the exact same purpose, so it does not seem like new ground is being broken.

[..] if we add this then the extension authors will never notice that there is a bug in their code as it will probably "just work".

This is already what is happening. The leaks in question don't cause issues for those respective packages.

The problem is that the internal mechanics of the typing module spread these leaks to other modules, which is where it becomes problematic. As mentioned in the associated issue, expecting all other extensions to be fixed is is going to be an impossible game of whack-a-mole.

@kumaraditya303
Copy link
Contributor

While this PR does introduce a static state variable, it represents state that is already static (the caches). FWIW there is a _cleanup variable just a few lines above that serves the exact same purpose, so it does not seem like new ground is being broken.

_cleanups has a different purpose, it is used by our test runner to check for reference leaks by clearing caches and it not used outside of that purpose.

for f in typing._cleanups:
f()

@gvanrossum
Copy link
Member

While this PR does introduce a static state variable

There seems to be a miscommunication here. When Kumar wrote "static global state in extension modules is bad" he meant C code (maybe referring to the bug you found in some extension?). This has nothing to do with static state in a .py module like referenced by this PR. So let's please all calm down.

@wjakob
Copy link
Contributor Author

wjakob commented Oct 31, 2022

@rhettinger : I added a longer paragraph summarizing the issue and incorporated your feedback.

@AlexWaygood
Copy link
Member

Please don't force-push to CPython PRs after review/discussion has started. It interacts badly with GitHub's UI, making it harder to see what exactly changed since we last looked at the PR :)

@wjakob
Copy link
Contributor Author

wjakob commented Oct 31, 2022

Sorry 😇. For posterity, I amended the commit to replace the contents of the file Misc/NEWS.d/next/Library/2022-10-24-11-01-05.gh-issue-98253.HVd5v4.rst.


@functools.wraps(func)
def inner(*args, **kwds):
try:
return cached(*args, **kwds)
return _caches[func](*args, **kwds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be marginally slower because instead of a single cell variable lookup, we do a global lookup, a cell variable, and a subscript. Probably not enough to matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we don't backport this: this should still be potentially faster in 3.12 than in 3.11.

@wjakob
Copy link
Contributor Author

wjakob commented Nov 23, 2022

Dear all,

It seems to me that the decision process regarding this PR has stalled. It would be good to come to a decision on how to proceed (accept, reject, further changes requested, ...?)

Thanks in advance!
Wenzel

@Fidget-Spinner
Copy link
Member

I will accept it as an enhancement to typing soon (ie it wont be backported). Sorry for the wait, I was busy with uni exams.

@Fidget-Spinner
Copy link
Member

@serhiy-storchaka or @JelleZijlstra does the code look okay to y'all? I plan to merge this soon.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this to go in, but I think a short comment wouldn't go amiss :) there's already a lot of code in typing.py that's quite subtle

Lib/typing.py Outdated Show resolved Hide resolved
Lib/typing.py Show resolved Hide resolved
@wjakob
Copy link
Contributor Author

wjakob commented Nov 25, 2022

It's quite puzzling -- I pushed two further commits to address your feedback, but they don't show up here. Perhaps a GitHub service issue/outage? See main...wjakob:cpython:typing-leak

@AlexWaygood
Copy link
Member

It's quite puzzling -- I pushed two further commits to address your feedback, but they don't show up here. Perhaps a GitHub service issue/outage? See main...wjakob:cpython:typing-leak

GitHub is quite laggy right now in general, I'm seeing it on typeshed and typing_extensions as well :)

@JelleZijlstra
Copy link
Member

Yes, GitHub is apparently having some issues; I saw it on other PRs. I did get emails for the two commits you pushed.

@wjakob
Copy link
Contributor Author

wjakob commented Nov 25, 2022

Edit: now, something is happening.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy for this to be merged (but will wait for at least one other approval before merging)

@JelleZijlstra
Copy link
Member

https://www.githubstatus.com/ reports some degradation.

@AlexWaygood AlexWaygood merged commit c314198 into python:main Nov 30, 2022
carljm added a commit to carljm/cpython that referenced this pull request Dec 1, 2022
* main: (112 commits)
  pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895)
  pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613)
  Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917)
  pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916)
  pythongh-89189: More compact range iterator (pythonGH-27986)
  bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491)
  pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906)
  pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850)
  pythongh-99845: Use size_t type in __sizeof__() methods (python#99846)
  pythonGH-99877)
  Fix typo in exception message in `multiprocessing.pool` (python#99900)
  pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869)
  pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893)
  pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825)
  pythonGH-81057: remove static state from suggestions.c (python#99411)
  Improve zip64 limit error message (python#95892)
  pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591)
  pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128)
  pythongh-82836: fix private network check (python#97733)
  Docs: improve accuracy of socketserver reference (python#24767)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-typing type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants